Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources

Authors

  • Doina Caragea
  • Adrian Silvescu
  • Vasant Honavar
Abstract

We propose a theoretical framework for specification and analysis of a class of learning problems that arise in open-ended environments that contain multiple, distributed, dynamic data and knowledge sources. We introduce a family of learning operators for precise specification of some existing solutions and to facilitate the design and analysis of new algorithms for this class of problems. We state some properties of instance and hypothesis representations, and learning operators that make exact learning possible in some settings. We also explore some relationships between models of learning using different subsets of the proposed operators under certain assumptions.

1 Learning from Distributed Dynamic Data

Many practical knowledge discovery tasks (e.g., learning the behavior of complex computer systems from observations, computer-aided scientific discovery in bioinformatics) present several new challenges in machine learning. The data repositories in such applications tend to be very large, physically distributed, often autonomously managed, and constantly growing over time (as new data get added). Thus, there is a need for algorithms that learn from distributed data by analysing the data sets where they reside, instead of shipping large volumes of data across networks, and that do so incrementally as the data becomes available over time, without having to reprocess data that has already been processed [5,14]. Although some incremental and distributed learning algorithms have been proposed in the literature, most of them [9,4,15] do not guarantee generalization accuracies that are provably close to those obtainable in the batch or centralized learning scenario. Some notable exceptions include parallel and distributed versions [1,13,6,12] and incremental versions [3] of batch algorithms that preserve the underlying nature of the centralized algorithm.
At present, with the exception of some interesting results (e.g., mistake bounds) for the closely related problem of online learning [7], a characterization of hypothesis classes that admit efficient exact or approximate distributed or incremental learning is lacking. Yet from a practical standpoint, the design and implementation of such learning agents is clearly of interest. Against this background, there is a need to address incremental and distributed learning problems in their full generality. This paper presents some tentative steps towards a framework for specification, analysis, and synthesis of incremental and distributed learning agents. We define some learning and information extraction operators to formally model some existing learning algorithms. We explore some properties of instance and hypothesis representations, and learning operators that guarantee the existence of incremental and distributed learning algorithms with provable performance guarantees relative to their batch or centralized counterparts. We offer some examples to illustrate the use of this theoretical framework in designing new incremental and distributed learning algorithms.

S. Wermter et al. (Eds.): Emergent Neural Computational Architectures, LNAI 2036, pp. 547–559, 2001. © Springer-Verlag Berlin Heidelberg 2001

2 Incremental Learning and Distributed Learning

A generic incremental learning scenario is shown in Fig. 1. In an incremental learning scenario, data sets D1, D2, ..., Dn are assumed to become available to the learner at discrete instants in time t1, t2, ..., tn. The learner starts with a (possibly null) initial hypothesis h0 which constitutes the prior knowledge of the domain. We assume that the learner is typically unable to store the data in its raw form. Thus, it can only maintain and update its hypothesis base as new data becomes available.
Thus, h0 gets updated to h1 on the basis of D1, and h1 gets updated to h2 on the basis of data D2, and so on.

In a distributed learning scenario, the data set is assumed to be physically distributed across multiple, possibly autonomous, data repositories D1, ..., Dn. The learner can visit the repositories to gather the information necessary for generating knowledge (e.g., in the form of pattern classification rules) by processing the data where it is stored. Alternatively, the data repositories may transmit the information to the learner. In either case, we prohibit transport of raw data among different sites. A distributed learning scenario is shown in Fig. 2.

A number of variations on these basic incremental and distributed learning scenarios can be envisioned under different assumptions concerning where and when data processing is performed, what information is made available to the learner, etc. More generally, we can consider incremental learning from distributed data sources. Space does not permit a detailed discussion of such scenarios.
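
As a rough illustration of the two scenarios described in this section, the following Python sketch assumes a deliberately simple hypothesis: a table of (feature value, class label) co-occurrence counts. This choice, the toy data, and the names extract_counts, update, combine, and predict are illustrative assumptions; they are not the learning operators defined in the paper. Because counts are additive, the incremental learner (which folds D1, D2, D3 into its hypothesis one at a time) and the distributed learner (which only receives per-site summaries, never raw data) both arrive at the same hypothesis as a batch learner that sees all the data at once.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Example = Tuple[str, str]  # (feature value, class label); hypothetical toy format


def extract_counts(data: Iterable[Example]) -> Counter:
    """Summarise a data set where it resides: only counts leave the site."""
    return Counter(data)


def update(h: Counter, data: Iterable[Example]) -> Counter:
    """Incremental step: build h_i from h_(i-1) and D_i without storing D_i."""
    return h + extract_counts(data)


def combine(summaries: List[Counter]) -> Counter:
    """Distributed step: merge the summaries transmitted by each repository."""
    total = Counter()
    for summary in summaries:
        total += summary
    return total


def predict(h: Counter, x: str) -> Optional[str]:
    """Return the label most often seen with feature value x, or None if unseen."""
    labels = {label: n for (value, label), n in h.items() if value == x}
    return max(labels, key=labels.get) if labels else None


# Toy data sets (assumed for illustration only).
D1 = [("sunny", "play"), ("rainy", "stay")]
D2 = [("sunny", "play"), ("sunny", "stay")]
D3 = [("rainy", "stay"), ("rainy", "play"), ("rainy", "stay")]

# Batch (centralized) learner: sees all data at once.
h_batch = extract_counts(D1 + D2 + D3)

# Incremental learner: h0 -> h1 -> h2 -> h3, one data set at a time.
h_incremental = Counter()  # null initial hypothesis h0
for D in (D1, D2, D3):
    h_incremental = update(h_incremental, D)

# Distributed learner: each site sends its counts, raw data stays put.
h_distributed = combine([extract_counts(D) for D in (D1, D2, D3)])
```

In this sketch exactness follows only from the additivity of counts; richer hypothesis classes may not admit such a summary, which is the kind of question the framework is meant to address.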

Publication date: 2001